Combining Textual and Visual Thesaurus for a Multi-modal Search in a Satellite Image Database
Authors
Abstract
Satellite images are becoming more numerous and of increasingly high resolution. It is therefore urgent to develop tools able to (semi-)automatically process such images in order to exploit them efficiently. When searching satellite images by visual content, in most cases the query image resides in the mind of the user as a set of subjective visual patterns, psychological impressions, or “mental pictures”. In this paper, we enrich the visual description by introducing to the user “the page zero”, a set of visual patches summarizing the image database. Since there is no perfect description of the visual content of images, most methods try to find a good compromise by balancing the image description between low-level and high-level features. However, there exists an evident semantic gap between the user's demands and the representation given by low-level features [6]. Text-based methods are very powerful in matching context but do not have access to image content. The basic idea of our work is to combine high-level and low-level features to find a class of ‘similar’ images with similar keywords and similar descriptors. In this paper, a novel content-based image retrieval framework is introduced. We propose three retrieval strategies: typing a keyword, selecting from a visual thesaurus, or using the multi-modal description. We show that the multi-modal approach is able to retrieve complex concepts better than standard visual or semantic approaches taken separately.
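The combination of high-level keywords and low-level descriptors described in the abstract can be sketched as a weighted fusion of a textual similarity and a visual similarity. The function below is an illustrative sketch, not the paper's actual scoring scheme; the Jaccard keyword overlap, cosine descriptor similarity, and the `alpha` balance parameter are all assumptions.

```python
import numpy as np

def multimodal_score(query_keywords, query_descriptor,
                     image_keywords, image_descriptor, alpha=0.5):
    """Fuse a high-level (keyword) and a low-level (descriptor)
    similarity into one multi-modal score; alpha balances the two.
    Illustrative sketch only -- the paper's exact fusion is not given."""
    # Textual similarity: Jaccard overlap between keyword sets.
    q, d = set(query_keywords), set(image_keywords)
    text_sim = len(q & d) / len(q | d) if (q | d) else 0.0
    # Visual similarity: cosine between descriptor vectors.
    qv = np.asarray(query_descriptor, dtype=float)
    dv = np.asarray(image_descriptor, dtype=float)
    denom = np.linalg.norm(qv) * np.linalg.norm(dv)
    visual_sim = float(qv @ dv / denom) if denom else 0.0
    return alpha * text_sim + (1 - alpha) * visual_sim
```

Ranking the database by this score lets an image match on either modality, while `alpha` lets the user lean toward the textual or the visual side of the query.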
Similar articles
Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus
We propose an unsupervised approach to learn associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal thesaurus that could serve as a foundation for inter-modality translation, and for hybrid navigation and search algorithms. We focus on extracting associations between visual features and textual keywords. Visual feature...
Automatic textual annotation of video news based on semantic visual object extraction
In this paper, we present our work on the automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and TV logos. I...
Multi-modal query expansion for video object instances retrieval
In this paper we tackle the issue of object instance retrieval in video repositories using minimal information from the user (e.g., textual description/tags). Starting from a set of tags, images containing the object of interest are crawled from popular image search engines and repositories (e.g., Bing, Flickr, Google), and the positive and most representative instances of the object are automati...
Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web
A system is proposed that combines textual and visual statistics in a single index vector for content-based search of a WWW image database. Textual statistics are captured in vector form using latent semantic indexing (LSI) based on the text of the containing HTML document. Visual statistics are captured in vector form using color and orientation histograms. By using an integrated approach, it beco...
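The single index vector described in this abstract can be sketched as a simple concatenation of the LSI text vector with normalized color and orientation histograms. This is a minimal illustration assuming L1-normalized histograms and illustrative weights; the paper's actual vector construction may differ.

```python
import numpy as np

def build_index_vector(lsi_vector, color_hist, orient_hist,
                       w_text=1.0, w_visual=1.0):
    """Concatenate an LSI text vector with L1-normalized color and
    orientation histograms into one index vector. The weights and
    normalization are assumptions for illustration, not the paper's
    exact scheme."""
    def l1_normalize(v):
        v = np.asarray(v, dtype=float)
        total = v.sum()
        return v / total if total else v

    text_part = w_text * np.asarray(lsi_vector, dtype=float)
    visual_part = w_visual * np.concatenate(
        [l1_normalize(color_hist), l1_normalize(orient_hist)])
    return np.concatenate([text_part, visual_part])
```

A query is mapped to a vector the same way and compared against the database with a standard vector-space similarity (e.g., cosine), so textual and visual cues contribute to a single ranking.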
Cross-Modal Fashion Search
In this demo we focus on cross-modal (visual and textual) e-commerce search within the fashion domain. In particular, we demonstrate two tasks: 1) given a query image (without any accompanying text), we retrieve textual descriptions that correspond to the visual attributes in the visual query; and 2) given a textual query that may express an interest in specific visual characteristics, we retrie...